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Abstract 

Background: Numerous genetic and genomic datasets related to complex diseases have been made available 
during the last decade. It is now a great challenge to assess such heterogeneous datasets to prioritize disease 
genes and perform follow up functional analysis and validation. Among complex disease studies, psychiatric 
disorders such as major depressive disorder (MDD) are especially in need of robust integrative analysis because 
these diseases are more complex than others, with weak genetic factors at various levels, including genetic 
markers, transcription (gene expression), epigenetics (methylation), protein, pathways and networks. 

Results: In this study, we proposed a comprehensive analysis framework at the systems level and demonstrated 
it in MDD using a set of candidate genes that have recently been prioritized based on multiple lines of 
evidence including association, linkage, gene expression (both human and animal studies), regulatory pathway, 
and literature search. In the network analysis, we explored the topological characteristics of these genes in the 
context of the human interactome and compared them with two other complex diseases. The network 
topological features indicated that MDD is similar to schizophrenia compared to cancer. In the functional 
analysis, we performed the gene set enrichment analysis for both Gene Ontology categories and canonical 
pathways. Moreover, we proposed a unique pathway crosstalk approach to examine the dynamic interactions 
among biological pathways. Our pathway enrichment and crosstalk analyses revealed two unique pathway 
interaction modules that were significantly enriched with MDD genes. These two modules are neuro- 
transmission and immune system related, supporting the neuropathology hypothesis of MDD. Finally, we 
constructed a MDD-specific subnetwork, which recruited novel candidate genes with association signals from a 
major MDD GWAS dataset. 

Conclusions: This study is the first systematic network and pathway analysis of candidate genes in MDD, providing 
abundant important information about gene interaction and regulation in a major psychiatric disease. The results 
suggest potential functional components underlying the molecular mechanisms of MDD and, thus, facilitate 
generation of novel hypotheses in this disease. The systems biology based strategy in this study can be applied to 
many other complex diseases. 
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Background 

During the past decade, rapid advances in high through- 
put technologies have helped investigators generate 
numerous genetic and genomic datasets, aiming to 
uncover disease causal genes and their actions in com- 
plex diseases. These datasets are often heterogeneous 
and multi-dimensional; thus, it is difficult to find consis- 
tent genetic signals for the connection to the corre- 
sponding disease. Specifically in psychiatric genetics, 
there have been numerous datasets from different plat- 
forms or sources such as association studies, including 
genome-wide association studies (GWAS), genome-wide 
linkage scans, microarray gene expression, and copy 
number variation, among others. Analyses of these data- 
sets have led to many exciting discoveries, including dis- 
ease susceptibility genes or loci, providing important 
insights into the underlying molecular mechanisms of 
the diseases. However, the results based on single 
domain data analysis are often inconsistent, with a very 
low replication rate in psychiatric disorders [1,2]. It has 
now been commonly accepted that psychiatric disorders, 
such as schizophrenia and major depressive disorder 
(MDD), have been caused by many genes, each of which 
has a weak or moderate risk to the disease [3,4]. Thus, a 
convergent analysis of multi-dimensional datasets to 
prioritize disease candidate genes is urgently needed. 
Such an approach may overcome the limitation of each 
single data type and provide a systematic view of the 
evidence at the genomic, transcriptomic, proteomic, 
metabolomic, and regulatory levels [5,6]. 

Recently, pathway and network-assisted analyses of 
genomic and transcriptomic datasets have been emer- 
ging as powerful approaches to analyze disease genes 
and their biological implications [7-11]. According to 
the observation of "guilt by association", genes with 
similar functions have been demonstrated to interact 
with each other more closely in the protein-protein 
interaction (PPI) networks than those functionally unre- 
lated genes [12]. Similarly, we have seen accumulating 
evidence that complex diseases are caused by functional 
related genes (e.g., in pathways or protein complex) 
through their dynamic interaction and regulation rather 
than action by single gene alone. Taken together, a sys- 
tematic analysis and comparison of disease genes in the 
PPI network would provide additional insights into the 
diseases that otherwise could not be identified by single 
gene or single marker analysis. It is important to note 
that, although network-based analysis has been widely 
applied in major complex diseases such as cancer, its 
application in psychiatric diseases has been limited so 
far. 

MDD is a complex mental disorder with a lifetime 
prevalence of 9-19% [13-15] and moderate heritability 



(37-43%) [16]. Previous studies have suggested the invol- 
vement of polygenic and mutifactorial features in the 
pathology of MDD, as well as complex interactions 
among genes (GxG) and environmental factors (GxE) 
[17,18]. Recently, we have performed the first gene 
prioritization using multi-dimensional evidence-based 
datasets in MDD, including association, linkage, gene 
expression (both human and animal studies), regulatory 
pathway, and literature search (both human and animal 
studies) [19]. A list of depression candidate genes 
(which we named DEPgenes) with high reliability has 
been generated based on this strategy [19]. However, 
several characteristics remain unclear: the functional 
relationships among these DEPgenes, how they interact 
and regulate with each other, and how they act in the 
MDD. Such investigations are warranted for a deeper 
understanding of the molecular mechanisms of MDD 
but require comprehensive analysis at the systems biol- 
ogy level. 

In this study, we first explored DEPgenes in the con- 
text of the PPI network for their topological characteris- 
tics and compared them with two representative 
complex diseases: schizophrenia and cancer. We per- 
formed the functional enrichment analyses using anno- 
tations from both Gene Ontology (GO) [20] and 
canonical pathways. More importantly, we examined 
crosstalk among the significantly enriched pathways by 
quantitatively measuring the shared protein components 
between each pair of pathways. Finally, we constructed a 
MDD-specific subnetwork using the DEPgenes and vali- 
dated them using the association data from an indepen- 
dent GWAS dataset for MDD. Our work demonstrated 
a practical framework for complex disease candidate 
gene analysis at the functional level, which can be 
applied to other complex diseases. 

Materials and methods 

Depression candidate genes 

We modified the scoring scheme in the gene prioritiza- 
tion system proposed by Kao et al [19] and reprioritized 
a list of 151 DEPgenes for MDD using the updated data 
information. Briefly, several lines of evidence-based data- 
sets were collected for MDD, including association stu- 
dies, linkage scans, gene expression (both human and 
animal studies), literature search (both human and ani- 
mal studies), and biological regulatory pathways. A data- 
set-specific score was assigned for each gene in each 
data source, and all data types were combined by an 
optimized weighting matrix to indicate the priority of a 
gene's association with MDD. The final gene list was 
selected based on a set of previously implicated core 
genes for MDD and validated by the GWAS dataset. 
Detailed information of this gene prioritization 
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procedure can be found in Kao et al [19]. Of note, the 
number of genes we used here is slightly different from 
that in Kao et al [19] due to the data and annotation 
updates, but the two lists were very similar. 

Other data sources and process 

For the purpose of comparison, we collected schizo- 
phrenia candidate genes and cancer genes. Schizophre- 
nia is a severe psychiatric disorder and has been 
suggested to share certain comorbidity with MDD 
clinically and genetically [21]. We included this disor- 
der here to represent other psychiatric disorders for 
the purpose of comparison. We retrieved 160 schizo- 
phrenia candidate genes prioritized in our recent work 
using a similar multi-dimensional evidence-based strat- 
egy [22]. Cancer has been the most studied among all 
complex disease and is expected to have substantially 
different pathological features from MDD. Thus, it 
would be interesting to see how those genes act differ- 
ently at the network and pathway levels. Cancer genes 
were downloaded from the Cancer Gene Census data- 
base [23] (CGC, July 2011). 

The human PPI data was downloaded from the Pro- 
tein Interaction Network Analysis (PINA) platform 
(downloaded in March 2010) [24], which collected and 
annotated data from six public PPI databases (MINT, 
IntAct, DIP, BioGRID, HPRD, and MIPS/MPact). Only 
proteins that could be successfully mapped to NCBI 
protein-coding genes were included in our analysis (see 
below). After removing self-interaction and duplicates, 
the final network included a total of 10,377 nodes and 
50,109 interactions. 

The GWAS dataset for major depression (dbGaP 
Study Accession: phs000020.v2.pl) was retrieved 
through our approved access to dbGaP [25]. We devel- 
oped a pipeline for quality controls (QC) to the dataset. 
Detailed information can be found in our previous stu- 
dies [19,26-28]. As a brief summary, there were 1,738 
depression patients and 1,802 matched normal controls, 
and 424,861 markers after QC, covering a total of 
16,758 genes. This dataset was used to evaluate the 
genes identified in this work. 

To coordinate these heterozygous datasets in this 
study, we downloaded several key annotation files from 
the National Center for Biotechnology Information 
(NCBI) [29] for the ease of integration. These included 
the annotation files of Homo_sapiens.gene_info.zip, gen- 
e_refseq_uniprotkb_collab.zip, and gene2refseq.zip (as of 
November 24, 2010). DEPgenes, schizophrenia candidate 
genes, cancer genes, PPI data, and GWAS data were all 
mapped to human protein-coding genes from NCBI. 
Those genes that could not be mapped appropriately 
were discarded from the subsequent analysis. 



Network topological properties 

In network analysis, there are several key topological 
indicators that have been defined to describe the beha- 
viors or characteristics of the nodes in a network. The 
most representative ones are degree, betweenness, and 
shortest path. Degree is defined as the number of adja- 
cent edges of a given node (protein) or the number of 
neighbor nodes interacting with it. Betweenness of a 
node is defined as the number of shortest paths going 
through the node; shortest path measures the nearest 
distance traveling from one node to another. We chose 
to examine the distribution of degree and betweenness 
of DEPgenes for exploration of their topological beha- 
viors, and compared them with those of schizophrenia 
candidate genes [22] and cancer genes [30]. 

Functional enrichment tests 

To perform functional enrichment tests of the candidate 
genes, we used WebGestalt [31] for Gene Ontology 
(GO) term analysis and used the Ingenuity Pathway 
Analysis (IPA) system [32] for both canonical pathways 
and molecular networks. Although WebGestalt can per- 
form enrichment tests for the Kyoto Encyclopedia of 
Genes and Genomes (KEGG) pathways [33], the IPA 
system provides a more comprehensive pathway 
resource based on manual collection and curation. The 
rich information returned by IPA is also suitable for 
pathway crosstalk analysis (see below), as it has more 
molecules and their connections included. Briefly, Web- 
Gestalt implements the hypergeometric test for the 
enrichment of GO terms in the candidate genes, fol- 
lowed by the correction of multiple testing using the 
Benjamini & Hochberg (BH) method [34]. The IPA sys- 
tem implements Fisher's exact test to determine whether 
a canonical pathway is enriched with genes of interest. 
Furthermore, the network analysis in the IPA system 
searches for significant molecular networks in a com- 
mercial knowledge base, including integrative informa- 
tion from literature, gene expression, and gene 
annotation. 

Pathway crosstalk 

We performed pathway crosstalk analysis using the 
pathways that were significantly enriched with DEPgenes 
after multiple testing correction. Two pathways are con- 
sidered to crosstalk if they share a proportion of DEP- 
genes. We introduced two measurements to 
computationally indicate the overlap of a pair of path- 
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denote the number of candidate genes in the two path- 
ways, respectively. To avoid non-specific inclusion of 
crosstalk, we further implemented the following rules: 

(1) only pathways with at least 5 DEPgenes were used; 

(2) only pathways with adjusted P values < 0.01 were 
used; and (3) two pathways in crosstalk were required to 
share at least 3 DEPgenes. These criteria were intro- 
duced to ensure that each of the pathways, as well as its 
crosstalk pair, have not only statistical significance but 
also a biologically meaningful number of genes, as some 
pathways may be too small. Finally, we found many sig- 
nificant pathways were identified by IPA; thus, they gen- 
erated thousands of crosstalk events when all the 
pathway combinations were compared. In practice, we 
chose only those crosstalk events that had scores within 
the top 10% of the score distribution. Although these 
criteria were arbitrary, we found it worked efficiently to 
balance an appropriate number of pathways and cross- 
talk events. 

Construction of MDD-specific subnetwork 

To construct a MDD-specific subnetwork, we applied 
the Steiner minimum tree algorithm that is implemen- 
ted in our software framework GenRev [35] to the 151 
DEPgenes. Solving the Steiner minimum tree algorithm 
was proposed by Klein and Ravi [36], which can be used 
for constructing a connected subnetwork given a list of 
query nodes. In our case, the query nodes are those 
encoded by DEPgenes, and the whole network is the 
human interactome extracted from the PINA database 
(see above). This algorithm aims to connect a maximum 
proportion of the query nodes. To accomplish this, addi- 
tional nodes in the network, but not in the query list, 
would be recruited in order to make the target subnet- 
work interconnected, while the algorithm is optimized 
towards a minimum list of the additional nodes. GenRev 
is a recently developed software tool which implements 
the Steiner minimum tree algorithm, as well as two 
other popular algorithms for subnetwork construction. 
It has been successfully applied in our previous work 
[6,22,37]. In the work discussed here, we used it for 
DEPgenes to construct MDD-specific subnetwork. 

Results 

Network topological properties of depression genes 

We collected 151 major depressive disorder candidate 
genes (DEPgenes). Among them, 134 had protein inter- 
action annotations in the human interactome. Figure 1 
shows the degree distribution. The average degree of 
these proteins was 18.55, and their median degree value 
was 6. As a comparison, the average degree was 14.75 
(median value 6) for the schizophrenia candidate genes 
(131 of the 160 genes mapped onto the human interac- 
tome) and 25.53 (median value 12) for the cancer genes 



(353 of the 459 genes mapped onto the interactome). 
Overall, although DEPgenes on average had a higher 
degree value than schizophrenia genes, their degree dis- 
tribution is similar to that of schizophrenia genes, and 
statistical tests indicated no significant difference (Wil- 
coxon test, P = 0.53). However, we observed different 
degree distributions between DEPgenes and cancer 
genes, and statistical tests indicated that DEPgenes had 
significantly lower degrees than cancer genes (P = 1.93 
x 10" 5 ). Specifically, cancer genes were found more fre- 
quently in the degree bins 18-32 and 32-40 (Figure 1). 

For the measurement of betweenness, the average 
value was 5.02 x 10 4 for DEPgenes, 4.01 x 10 4 for the 
schizophrenia genes, and 5.61 x 10 4 for cancer genes, 
while their median values were 5.12 x 10 3 , 3.54 x 10 3 , 
and 1.02 x 10 4 , respectively. Similar to the measurement 
of degree, there was no significant difference in the 
betweenness values between the MDD and schizophre- 
nia candidate genes (P = 0.21), but cancer genes had sig- 
nificantly larger betweenness values than DEPgenes (P - 
0.03). These results indicated that the candidate genes 
for the two major psychiatric disorders, MDD and schi- 
zophrenia, shared similar topological features in the 
human interactome, while both had substantially differ- 
ent features when compared to cancer genes. 

Gene Ontology enrichment analysis by WebGestalt 

To explore whether DEPgenes share specific functional 
features, we performed GO enrichment analysis using 
WebGestalt (version 2.0). We found that many neurode- 
velopment related functions and biological processes 
were significantly enriched in DEPgenes, regardless of 
GO terms categories (BP: biological process; MF: mole- 
cular function; and CC: cellular component) (Table 1). 
The most significant terms in each of these three GO 
categories are: synaptic transmission in biological pro- 
cess (Pbh = 1.18 x 10" 34 ), G-protein coupled amine 
receptor activity in molecular function (P B h = 7.18 x 10" 
19 ), and neuron projection in cellular component (P BH = 
3.91 x 10" 20 ). Other enriched GO terms of interest 
include transmission of nerve impulse, neurological pro- 
cess, cell communication, dopamine binding, extracellu- 
lar ligand-gated ion channel activity, ligand-gated 
channel activity, axon, and dendrite. 

Pathway enrichment by Ingenuity Pathway Analysis 

We then examined whether DEPgenes are enriched in 
canonical pathways by performing Fishers exact test in 
the IPA system. Table 2 shows the 12 most significantly 
enriched pathways. Remarkably, most of them are 
related to the neurotransmission system, supporting the 
neuropathology hypothesis of MDD (Table 2). Among 
them, we highlighted serotonin receptor signaling, dopa- 
mine receptor signaling, PXR/RXR activation, 
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Figure 1 Comparison of degree distribution of major depressive disorder (MDD), schizophrenia (SCZ), and cancer genes. The disease 
genes were grouped by their degree into degree bins. Here, degree was measure by the number of interactors for each disease gene in the 
human interactome. The top panel shows the histogram degree distribution, and the bottom panel shows the curve degree distribution. In the 
bottom panel, each vertical line represents the median value of the degrees in each disease category. Note that MDD and SCZ candidate genes 
had the same median value of degrees so that their vertical lines could not be distinguished. 



neuropathic pain signaling on dorsal horn neurons, 
CREB signaling in neurons and tryptophan metabolism. 
This result is consistent with prior knowledge of MDD 
[38,39], providing further evidence of the neuro-related 
processes in this disorder. 

Crosstalk among significantly enriched pathways 

Since many genes and pathways might be involved in 
MDD, to more deeply understand how these pathways 
are related, we performed a pathway crosstalk analysis. 
We first selected the significantly enriched pathways 
from the IPA results. Specifically, we selected those 
pathways having P BH < 0.01 and > 5 DEPgenes. There 
were 71 pathways that met these criteria. Among them, 
69 pathways shared at least 3 genes with other path- 
ways. A total of 571 edges (links) connected between 



any two of these pathways, and these edges were ranked 
according to the average scores of the Jaccard Coeffi- 
cient and the Overlap Coefficient (see the Materials and 
methods section). We selected the top 10% edges, which 
resulted in 57 pairs of pathway crosstalk, and con- 
structed the pathway crosstalk network for MDD. This 
pathway crosstalk was the first of its kind in MDD. 

Graphical presentation of the selected pathway cross- 
talk revealed two self-clustered modules, as well as small 
but strongly-linked pathway pairs. As shown in Figure 2, 
the two large modules are dominated by neuro-related 
signal transduction and immune related pathways, 
respectively. The neuro-related signal transduction mod- 
ule consists of the calcium signaling pathway, synaptic 
long term potentiation, CREB signaling in neurons, axo- 
nal guidance signaling, and others. These pathways have 
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Table 1 Gene Ontology (GO) terms enriched with module genes (GO level > 4) 



GO terms 



Observed* 



Biological process 

GO:0007268: synaptic transmission 

GO:0007267: cell-cell signaling 

GO:0019226: transmission of nerve impulse 

GO:0044057: regulation of system process 

GO:0051239: regulation of multicellular organismal process 

GO:0050877: neurological system process 

GO:0007154: cell communication 

Molecular function 

GO:0008227: G-protein coupled amine receptor activity 
GO:0035240: dopamine binding 

GO:0005230: extracellular ligand-gated ion channel activity 

GO:0004888: transmembrane receptor activity 

GO:0005102: receptor binding 

GO:0022834: ligand-gated channel activity 

Cellular component 

GO:0043005: neuron projection 

GO:0044459: plasma membrane part 

GO:0000267: cell fraction 

GO:0005887: integral to plasma membrane 

GO:0031226: intrinsic to plasma membrane 

GO:0042995: cell projection 

GO:0005886: plasma membrane 

GO:0030424: axon 

GO:0030425: dendrite 

GO:0005626: insoluble fraction 



45 
56 
46 
36 
55 
59 
103 

15 
7 

12 
40 
31 
13 

30 
61 
45 
47 
47 
35 
80 
18 
17 
32 



7.18 x 
8.78 x 
2.84 x 
1.23 x 
1.07 x 
4.72 x 

6.14 x 

5.13 x 
2.25 x 
4.17 x 
6.55 x 
9.87 x 
2.46 x 

2.43 x 
3.12 x 
1.45 x 
5.84 x 
1.27 x 
5.50 x 
1.64 x 

1.15 x 
2.76 x 
7.11 x 



10" 21 

10" 13 

10" 12 

10" 12 

10" 11 
1Q -io 

10" 22 

1Q -20 

10" 19 
10" 19 

1Q -18 



18 



1.18 x 10" i4 
7.21 x 10~ 34 
1.55 x 10~ 33 
2.89 x 10" 27 
2.89 x 10" 27 
9.69 x 10" 24 
1.12 x 10" 23 



7.18 x 10~ 19 

1.58 x 10~ 11 

2.34 x 10" 10 
2.78 x 10" 10 
3.07 x 10~ 9 
5.74 x 10~ 9 

3.91 x 10" 20 

2.51 x 10~ 18 

7.78 x 10" 18 

2.35 x 10" 17 
4.09 x 10" 17 
1.48 x 10~ 16 
3.77 x 10~ 16 
2.31 x 10~ 14 
4.94 x 10" 13 
1.14 x 10~ 11 



*Number of the observed DEPgenes in the category. 

$ P values were adjusted by Benjamini & Hochberg (BH) method [34]. 



Table 2 Canonical pathways enriched with module genes 
by Ingenuity Pathway Analysis (IPA) (P B h < 10" 6 ) 



Ingenuity canonical pathways Observed* P B h $ 



cAMP-mediated signaling 


23 


6.31 x 10~ 16 


G-protein coupled receptor signaling 


31 


5.01 x 10~ 15 


Serotonin receptor signaling 


12 


6.31 x 10~ 15 


Corticotropin releasing hormone signaling 


14 


7.94 x 10" 11 


Dopamine receptor signaling 


11 


3.80 x 10~ 9 


Glucocorticoid receptor signaling 


17 


1.35 x 10~ 8 


PXR/RXR activation 


10 


2.29 x 10~ 8 


Amyotrophic lateral sclerosis signaling 


11 


5.89 x 10~ 8 


Neuropathic pain signaling on dorsal horn 


11 


5.89 x 10~ 8 


neurons 






Relaxin signaling 


12 


1.02 x 10~ 7 


CREB signaling in neurons 


13 


1.62 x 10~ 7 


Tryptophan metabolism 


11 


6.61 x 10~ 7 



*Number of the observed DEPgenes in the category. 

$ P values were adjusted by Benjamini & Hochberg (BH) method [34]. 



been well studied and have been indicated in many psy- 
chiatric disorders that share comorbidity with MDD 
[5,40,41]. 

The second large pathway crosstalk module mainly 
consisted of immune-related pathways, such as IL-6 sig- 
naling and LXR/RXR activation (Figure 2). This strongly 
supports recent discoveries of immunity and inflamma- 
tion related processes in psychiatric disorders [42,43], 
including MDD [44]. Many genes that drove the cross- 
talk in the figure were also found to function in both 
neuro- and immune-related processes like APOE [45], 
TNF [46], and 116 [47]. 

Molecular subnetwork 

A total of 8 significant molecular networks were identi- 
fied by Fisher's exact test in the IPA system with addi- 
tional criteria specifying that a pathway's score was at 
least 10 and each pathway had at least 10 DEPgenes. 
Here, score was transformed from -logP, where P is cal- 
culated by the Fisher's exact test. Figure 3 showed the 
two most significant networks, in which DEPgenes were 
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Figure 2 Pathway crosstalk and functional map of DEPgenes (major depressive disorder genes) In this figure, each node represents a 
significant pathway, and each edge represents a pathway crosstalk, i.e., a significant overlap of the component genes between two linked 
pathways. The color of each node is approximately proportional to the adjusted P (P B h) value of the corresponding pathway in the pathway 
enrichment analysis by Ingenuity Pathway Analysis (IPA). Darker color indicates lower P BH value. The size of each node is approximately 
proportional to the number of DEPgenes found in the corresponding pathway. The width of each edge is approximately proportional to the 
overlap score of the related pathways (see Materials and methods). 



highlighted in red. In the first network (Figure 3A), we 
observed 18 DEPgenes, and the top functions of this 
network included energy production, drug metabolism, 
and small molecule biochemistry. The second network, 
which consisted of 18 DEPgenes too, was enriched with 
the functions of genetic disorder, neurological disease, 
and psychological disorders. On the molecular level, we 
observed a group of serotonin receptors and G-proteins 
(Figure 3), further supporting the involvement of neuro- 
logical signaling in major depressive disorder. 

MDD-specific subnetwork 

Among the 151 DEPgenes, 134 were found to have PPI 
annotations in the human interactome. Using our 



recently developed subnetwork extraction tool GenRev, 
we successfully constructed a MDD-specific subnetwork. 
The subnetwork contained 130 DEPgenes and 62 addi- 
tional genes that were recruited via the subnetwork con- 
struction algorithm (Steiner minimum tree algorithm 
[36]) (Figure 4). To evaluate the genes identified in the 
subnetwork, we compared their P values in a GWAS 
dataset for MDD (see the Materials and methods sec- 
tion). Among the 16,758 genes in the MDD GWAS 
dataset, we had 122 DEPgenes in the subnetwork, 56 
non-DEPgenes in the subnetwork (we named them sub- 
network's recruited genes), and remaining 16,580 genes 
outside of the subnetwork. For each gene, we assigned a 
gene-wise P value based on the SNP that had the 



Jia et al. BMC Systems Biology 201 1, 5(Suppl 3):S12 
http://www.biomedcentral.eom/1 752-0509/5/S3/S1 2 



Page 8 of 1 3 




smallest P value among all the SNPs mapped to the 
gene region [4,26]. When we separated gene-wise P 
values into four bins (<0.001, 0.001-0.01, 0.01-0.05, and 
>0.05), we found both the DEPgenes and the newly 
recruited genes in the subnetwork were more frequent 
in the small P value bins (<0.001, 0.001-0.01, 0.01-0.05) 
than other genes (Figure 5). Furthermore, DEPgenes 
tended to have smaller gene-wise P values than the 
newly recruited genes, supporting that subnetwork ana- 
lysis could identify potential disease genes that would 
otherwise unlikely be detected by traditional singe gene 
or single marker association studies. When using cutoff 
value 0.05 to separate the genes into three gene sets (i. 
e., nominally significant genes were defined as those 
with gene -wise P value < 0.05), we found that the DEP- 
genes in the subnetwork had a significantly larger pro- 
portion of nominally significant genes in the GWAS 
dataset (Fishers exact test, P = 4.13 x 10" 4 ) compared to 
the remaining genes. The recruited genes in the subnet- 
work were found to have a similar trend of larger pro- 
portion of nominally significant genes than remaining 
genes, but this difference was not significant (P = 0.10). 
Of note, when comparing the genes in the MDD-speci- 
fic subnetwork (122+56 = 178 genes) with those outside 
of the network (16,580 genes), the subnetwork genes 



had significantly more nominally significant genes (P = 
1.81 x 10" 4 ). 

Discussion 

Although there have been numerous reports of suscept- 
ibility genes or loci to psychiatric disorders such as 
major depressive disorder and schizophrenia, no disease 
causal genes have been confirmed [48-50]. One impor- 
tant task now is to reduce the data noise and prioritize 
the candidate genes from multiple dimensional genetic 
and genomic datasets that have been made available 
during the last decade and then explore their functional 
relationships for further validation. To our knowledge, 
this is the first systematic network and pathway analysis 
for MDD using candidate genes prioritized from com- 
prehensive evidence-based data sources. By overlaying 
the MDD candidate genes in the context of the human 
interactome, we examined the topological characteristics 
of these genes by comparing them with those of schizo- 
phrenia and cancer candidate genes. We further per- 
formed pathway enrichment analysis to better 
understand the biological implications of these genes in 
the context of the regulatory system. Building on our 
observation of the large number of pathways enriched 
with DEPgenes, we developed novel approaches to 
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proportional to the integrative evidence score of each DEPgene, as described in Kao et al [19]. 



measure pathway crosstalk so that complex gene action 
and regulation could be explored, thus providing us new 
insights into the interpretation of the underlying mole- 
cular mechanisms in MDD. 

Our network topological analysis revealed that DEP- 
genes showed similar topological characteristics to schi- 
zophrenia, supporting previous reports that depression 
and schizophrenia might share comorbidity both clini- 
cally and genetically [21]. For example, clinical symptoms 
such as psychosis and neuro-cognitive impairments have 
been observed in both depression and schizophrenia 
patients [21], and shared genetic variance has been 
reported between major depression and schizophrenia 
[51,52]. Although similar network topological features 
are expected by many investigators, our study was the 



first to confirm, and provided further evidence, that the 
topological features of depression genes are different 
from cancer genes. It is worth noting that, although 
depression and schizophrenia genes had similar degree 
distributions (Figure 1), depression genes had moderately 
stronger connectivity and betweenness than schizophre- 
nia genes. 

Of significance, our pathway crosstalk analysis revealed 
two large clustered modules, both of which had impor- 
tant implications to MDD (Figure 2). The first cluster 
included 17 pathways, and it was dominated by neuro- 
signaling pathways. Among these pathways, neuropathic 
pain signaling in dorsal horn neurons (P B h = 5.89 x 10" 
8 ), CREB signaling in neurons (P B h = 1-62 x 10~ 7 ), synap- 
tic long term potentiation (P BH = 6.17 x 10~ 5 ), and axonal 
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guidance signaling (P B h = 1-55 x 10" 4 ) are involved in 
neuron/brain tissues and have been reported to be 
involved in MDD [53,54]. Our further examination of the 
genes contributing to the crosstalk revealed that the most 
frequently shared genes in this cluster were PRKACA 
(functioning in n = 15 pathways in this cluster), GNAS (n 
= 14), GNB3 (n = 13), ADCY7 (n = 10), GNAL (n = 9), 
AKT1 (n = 9), CREB1 (n = 8), CAMK2A (n = 6), GRIN2B 
(n = 5), GRIN2A (n = 5), and GRIN1 (n = 5), among 
others. 

The second cluster is primarily related to immunity 
and inflammation, including the IL-6 signaling pathway 
(P B h = 6.17 x 10" 3 ), differential regulation of cytokine 
production in macrophages and T helper cells by IL- 
17A and IL-17F (P BH = 8.13 x 10" 6 ), and LXR/RXR acti- 
vation (P B h = 4.57 x 10" 4 ). For example, the LXR/RXR 
pathway may play a role in the prevention of pro- 
grammed cell death and a role in immune responses to 
inhibit inflammatory gene expression [55]. The most 
frequently shared genes in this cluster included TNF 
(functioning in n = 14 pathways), IL6 (n = 13), IL1B 



(n = 13), IL10 (n = 9), CCL2 (n = 8), NGFR (n = 7), and 
AKT1 (n = 7), among others. These genes further sup- 
port the observation that immune- and inflammation- 
related functions are involved in this cluster. During 
recent years, evidence of immune and inflammation sys- 
tems in psychiatric disorders has accumulated quickly 
[3,4,56]. 

In addition to the two major clusters, there are other 
crosstalk pairs that are noteworthy. The most interesting 
one is the pathway pair of cAMP-mediated signaling 
and G-protein coupled receptor signaling. The evidence 
linking these two pathways is strong, as its edge had a 
score 0.87. Moreover, these two pathways had the most 
significant enrichment test P values (6.31 x 10" 16 and 
5.01 x 10" 15 , respectively) in the IPA canonical pathway 
analysis (Table 2). The interaction between these two 
pathways involved 23 DEPgenes, including several sero- 
tonin receptor genes like HTR1A, HTR1B, and HTR5A. 
The cAMP-mediated signaling and G-protein coupled 
receptor signaling pathways have long been studied for 
their roles in the nervous system. Of note, there were 
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several crosstalk links between one of these two path- 
ways and other pathways that were enriched with the 
DEPgenes. Those pathway crosstalk connections were 
not shown in Figure 2 because they did not meet our 
stringent criteria for pathway inclusion (at least 3 DEP- 
genes shared between the pair of pathways or not within 
the top 10% crosstalk score, see the Materials and meth- 
ods section). One example is the link between the 
cAMP-mediated signaling pathway and the serotonin 
receptor signaling, both of which were significantly 
enriched with DEPgenes, but their crosstalk score fell 
outside of the top 10% in the score distribution. 

Our aim of the depression-specific subnetwork con- 
struction was to explore functional interactions of DEP- 
genes in a local protein-protein interaction environment. 
Our follow-up evaluation of the disease association of 
both DEPgenes and the additionally recruited genes 
using a major GWAS dataset for depression found that 
these genes tended to have small P-values (i.e., at the 
nominal significance level). Since the GWAS data we 
used here was an independent dataset, and GWAS was 
designed to be hypothesis free in genome-wide associa- 
tion studies, our survey of MDD-specific subnetwork 
genes demonstrated that this approach is efficient to 
find a set of genes that are both functionally interactive 
and enriched with the association signals of the corre- 
sponding disease. Therefore, this approach is not only 
promising to find novel disease candidate genes for 
future validation but also useful to study the disease at 
the systems biology level. 

This work has a few limitations. First, our DEPgenes 
and the follow up pathway/network analyses were con- 
ducted based on computational strategies. Although 
informative, this approach generally requires extensive 
experimental validation. Thus, although we validated 
subnetwork genes at the genome-wide level using the 
GWAS dataset, further validation of specific novel genes 
using more samples is urgently needed. Second, the 
pathway crosstalk analysis was based on the scores mea- 
sured by Jaccard Coefficient (/C) and Overlap Coeffi- 
cient (OC). In this study, we selected the pathway pairs 
empirically, that is, those ranked in the top 10%. P 
values from a statistic test would be better applied to 
select significant crosstalk. We did not apply this 
method because the Ingenuity Pathway Analysis system 
is a commercial software tool, and the information 
needed to conduct such a statistic test is not publically 
available. Accordingly, we could only use the limited 
information for pathway crosstalk analysis. Third, the 
MDD-specific subnetwork was built on available human 
interactome data. Although the number and quality of 
protein interactions has recently improved greatly, the 
human interactome is still incomplete with many false 
positives [24]. Additionally, subnetwork extraction relies 



on specific algorithms and corresponding parameters. 
Several algorithms exist for subnetwork extraction. In 
this study, we applied the Steiner minimum tree algo- 
rithm, which can effectively reduce unrelated nodes 
(genes) to be included, but it may also miss some 
important functional links. Our analysis, along with our 
recent application of this algorithm in other complex 
diseases (schizophrenia [5,57], hepatocellular carcinoma 
[37], and epilepsy [6]), has demonstrated this strategy is 
practical and could provide valuable information of the 
interactions among DEPgenes. 

Conclusions 

We developed a systems biology framework for 
advanced and functional analyses of major depressive 
disorder candidate genes. The network topological ana- 
lysis revealed similar network characteristics between 
depression and schizophrenia, but network characteris- 
tics of both depression and schizophrenia differed from 
cancer, consistent with previous clinical and genetic stu- 
dies. However, the depression genes interacted moder- 
ately stronger than schizophrenia genes in the context 
of the protein-protein interaction network. Our pathway 
enrichment tests followed by pathway crosstalk analysis 
revealed that neurotransmission and immune systems 
might play key roles in the etiology of depression, 
assuming that our evidence-based DEPgenes were repre- 
sentative of depression. Notably, we found two major 
functional clusters in the pathway crosstalk network. 
We further constructed a depression-specific subnet- 
work, in which additional candidate genes were identi- 
fied with enriched association signals using the 
depression GWAS dataset. These findings present a 
wealth of information for future validation. The frame- 
work we presented in this work can be applied to many 
other complex diseases, such as addiction and bipolar 
disorder. 
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